IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 61, NO. 7, JULY 2013 






Iterative LMMSE Channel Estimation and 
Decoding Based on Probabilistic Bias 

Keigo Takeuchi, Member, IEEE, Ralf R. Müller, Senior Member, IEEE, and Mikko Vehkaperä, Member, IEEE



Abstract — Iterative channel estimation and decoding based 
on probabilistic bias is investigated. In order to control the 
occurrence probability of transmitted symbols, biased convo- 
lutional codes (CCs) are proposed. A biased CC is obtained 
by puncturing the parity bit of a conventional (unbiased) CC 
and by inserting a fixed bit at the punctured position when the 
state is contained in a certain subset of all possible states. A 
priori information about the imposed bias is utilized for the 
initial linear minimum mean-squared error (LMMSE) channel 
estimation. This paper focuses on biased turbo codes that are 
constructed as the parallel concatenation of two biased CCs 
with interleaving, and proposes an iterative LMMSE channel 
estimation and decoding scheme based on approximate belief 
propagation. The convergence property of the iterative LMMSE 
channel estimation and decoding scheme is analyzed via density 
evolution (DE). The DE analysis allows one to design the 
magnitude of the bias according to the coherence time, in terms 
of the decoding threshold. The proposed scheme is numerically 
shown to outperform conventional pilot-based schemes in the 
moderate signal-to-noise ratio (SNR) regime, at the expense of a 
performance degradation in the high SNR regime. 

Index Terms — Biased convolutional codes, linear minimum 
mean-squared error (LMMSE) channel estimation, iterative de- 
coding, belief propagation, density evolution. 



I. Introduction 

COHERENT detection has been used to meet demand 
for high-speed data transmission in wireless communi- 
cation systems [1], [2]. In coherent detection, the receiver 
first estimates channel state information (CSI) with known 
pilot signals sent from the transmitter, and then detects the 
transmitted data on the basis of the obtained CSI. As methods 
for channel estimation, two training-based schemes have been 
commonly considered: One is based on time-division multiplexed
pilots (TDMPs) [3], and the other on super-imposed
pilots (SIPs) [4], [5]. In the TDMP-based scheme, pilot and 
data symbols are transmitted on different time slots, so that 
there is no interference between the pilot and data symbols. 
A drawback of the TDMP-based scheme is a rate loss due to 
transmission of the pilot symbols. In the SIP-based scheme, 
on the other hand, a superposition of pilot and data symbols is 
sent in each time slot, which means that there is interference 
between the pilot and data symbols. This interference results in 
a performance degradation at the receiver. Furthermore, a part 
of the power is consumed for transmission of the SIP symbols, 
whereas there is no rate loss for the SIP-based scheme. 

In order to reduce the overhead for transmission of pilot 
symbols, iterative channel estimation and decoding has been 
considered [6]–[13]. In iterative channel estimation and decoding,
tentative decisions fed back from the decoder are utilized
to refine the initial channel estimates. Iterative channel estima- 
tion and decoding schemes can be systematically derived via a 
generic message-passing algorithm, called belief propagation 
(BP) [14]–[17]. Since the optimal minimum mean-squared
error (MMSE) channel estimator is nonlinear and infeasible 
in terms of the computational complexity [6], [7], suboptimal 
linear channel estimators have been used instead, such as 
a least-squares channel estimator [9] and the linear MMSE 
(LMMSE) channel estimator [8], [10], [12], [13]. Iterative 
LMMSE channel estimation and decoding has been theoret- 
ically shown to provide a significant reduction of the pilot 
overhead compared to non-iterative channel estimation [13]. 

The aim of this paper is to propose a novel training- 
based scheme that is suitable for iterative channel estima- 
tion and decoding. In the proposed scheme, a bias of the 
occurrence probability of the data symbols is utilized as a 
priori information for training [18], [19]. In our previous 
work [20], we used a serial concatenation of a convolutional 
code (CC) and a biased block code with interleaving [21] 
to control the occurrence probability of the data symbols. 
Since increasing the code length of the biased block code 
results in an exponential growth of the decoding complexity, 
biased block codes with large code length could not be used. 
Consequently, the performance of the previous construction is 
much inferior to that of strong codes, such as turbo codes [22] 
and low-density parity-check (LDPC) codes [15]. In order 
to overcome this drawback, we propose biased CCs to be 
used as constituent codes of parallel concatenated codes with 
interleaving [23]. In this paper, this encoding scheme is called 
biased turbo codes. 

Density evolution (DE) is a powerful method for analyzing 
the convergence property of BP-based iterative schemes [24],
[25]. DE traces the evolution of the probability density 
function (pdf) of messages when the code length tends to 
infinity. Since the pdf is commonly intractable, Gaussian 
approximations have been proposed on the basis of entropy 
matching [15], [26], [27], bit error rate (BER) matching [13], 
[25], [28], mean matching [29], or signal-to-noise ratio (SNR) 
matching [30]. In this paper, we follow the reference [15] 
to present DE with the Gaussian approximation based on 
entropy matching for iterative LMMSE channel estimation and 
decoding. The presented DE analysis allows us to design the 
magnitude of the bias for biased turbo codes according to the 
coherence time. 

The rest of this paper is organized as follows: After sum- 
marizing the notation used in this paper, biased CCs are 
proposed in Section II. In Section III, we derive iterative 
LMMSE channel estimation and decoding for a biased-turbo- 
coded Rayleigh block-fading channel. In Section IV, we use 
the Gaussian approximation based on entropy matching to 
present DE analysis for the iterative channel estimation and 
decoding. In Section V, the bias-based scheme is numerically 
compared to pilot-based schemes in terms of BER. Section VI 
concludes this paper. 

Throughout this paper, $D$ and $\mathbb{F}_2$ denote the delay operator
and the binary field consisting of $\{0, 1\}$, respectively. For a
complex number $z \in \mathbb{C}$ and a complex vector $\boldsymbol{v}$, $z^*$, $\boldsymbol{v}^{\mathrm{T}}$, and
$\boldsymbol{v}^{\mathrm{H}}$ stand for the complex conjugate, the transpose, and the
conjugate transpose, respectively. The Kronecker delta and the
Dirac delta function are denoted by $\delta_{i,j}$ and $\delta(\cdot)$, respectively.
For a function $p(x|y)$, $p(x|y) \propto f(x)$ means that $p(x|y)$ is
proportional to $f(x)$, i.e. there is an $x$-independent constant
$C(y)$ such that $p(x|y) = C(y) f(x)$. The real Gaussian
distribution with mean $\mu$ and variance $\sigma^2$ is denoted by
$\mathcal{N}(\mu, \sigma^2)$, whose pdf is written as $p_{\mathrm{G}}(\cdot; \mu, \sigma^2)$. On the other
hand, the proper complex Gaussian distribution with mean $\mu$
and variance $\sigma^2$ [31] is represented by $\mathcal{CN}(\mu, \sigma^2)$, whose pdf
is denoted by $p_{\mathrm{CG}}(\cdot; \mu, \sigma^2)$.

II. Biased Convolutional Code 

A biased CC is obtained by modifying the output of a 
conventional (unbiased) CC: The output of the biased CC is 
set to a fixed value, regardless of the input, when the state is 
contained in a certain subset of all possible states. Otherwise, 
the output is equal to that of the corresponding unbiased CC. 
In this paper, we focus on systematic binary biased CCs with 
rate 1/2. It is straightforward to apply our construction to 
general biased CCs. 

We start with an unbiased CC with a generating rational
function with memory $m$,

$$g(D) = \frac{\sum_{i=0}^{m} p_i D^i}{\sum_{i=0}^{m} q_i D^i}, \quad q_0 = 1, \tag{1}$$

with $p_i, q_i \in \mathbb{F}_2$. Let $b_k \in \mathbb{F}_2$ denote the unbiased information
bit at time $k$. The parity bit $u_k \in \mathbb{F}_2$ of the unbiased CC at
time $k$ is given by

$$u_k = \sum_{i=0}^{m} p_i \sigma_{k-i}, \tag{2}$$

where the $k$th state variable $\sigma_k \in \mathbb{F}_2$ is updated as follows:

$$\sigma_k = b_k + \sum_{i=1}^{m} q_i \sigma_{k-i}. \tag{3}$$

Let $\boldsymbol{\sigma}_{k-1} = (\sigma_{k-1}, \ldots, \sigma_{k-m})^{\mathrm{T}} \in \mathbb{F}_2^m$ denote the $m$-dimensional
state vector just before the $k$th information bit
$b_k$ enters the encoder. The parity bit of a biased CC is defined
as follows:

Definition 1 (Biased Convolutional Code). Let $\mathcal{S}_p \subset \mathbb{F}_2^m$
denote a subset of all possible state vectors. For fixed $u_p \in \mathbb{F}_2$,
the parity bit $u_k^{(u_p)} \in \mathbb{F}_2$ of the rate one-half systematic binary
biased CC with the generating rational function (1) at time $k$
is given by

$$u_k^{(u_p)} = \begin{cases} u_p & \text{for } \boldsymbol{\sigma}_{k-1} \in \mathcal{S}_p, \\ u_k & \text{otherwise,} \end{cases} \tag{4}$$

where $u_k$ denotes the parity bit (2) of the corresponding
unbiased CC at time $k$.

Let us calculate the probability with which the parity bit (4)
takes $u_p$ under the assumption that the information bits are
drawn uniformly and randomly. Since all state vectors occur
with equal probability, except for initial times, we obtain

$$\Pr\left(u_k^{(u_p)} = u_p\right) = \Pr(\boldsymbol{\sigma}_{k-1} \in \mathcal{S}_p) + \frac{1}{2}\Pr(\boldsymbol{\sigma}_{k-1} \notin \mathcal{S}_p) = \frac{1}{2}\left(1 + 2^{-m}|\mathcal{S}_p|\right), \tag{5}$$

which implies that the parity bit (4) is biased for $|\mathcal{S}_p| \neq 0$,
whereas the systematic bits are unbiased. This a priori information
(5) is utilized for channel estimation.
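The bias in (5) can be checked empirically. The following Python sketch encodes uniformly random information bits with a biased CC built from (2)-(4) and measures how often the parity bit equals $u_p$. The function name `biased_cc_parity`, the integer labeling of state vectors (first component taken as the most significant bit), and the example code with memory 4 are illustrative assumptions rather than part of the construction itself.

```python
import random


def biased_cc_parity(bits, p, q, S_p, u_p):
    """Parity bits of a rate-1/2 systematic biased CC, following (2)-(4).

    p, q : coefficient lists [p_0..p_m] and [q_0..q_m] of g(D), q[0] == 1.
    S_p  : set of integer state labels (assumed: sigma_{k-1} is the MSB).
    u_p  : fixed bit sent whenever the current state lies in S_p.
    """
    m = len(p) - 1
    state = [0] * m  # (sigma_{k-1}, ..., sigma_{k-m}), all-zero start
    parity = []
    for b in bits:
        label = int("".join(str(s) for s in state), 2)
        # State update (3): sigma_k = b_k + sum_{i>=1} q_i sigma_{k-i} over F_2.
        sigma_k = (b + sum(q[i] * state[i - 1] for i in range(1, m + 1))) % 2
        # Unbiased parity (2): u_k = sum_{i>=0} p_i sigma_{k-i} over F_2.
        u_k = (p[0] * sigma_k
               + sum(p[i] * state[i - 1] for i in range(1, m + 1))) % 2
        # Biased parity (4): send u_p if the state is in S_p, else u_k.
        parity.append(u_p if label in S_p else u_k)
        state = [sigma_k] + state[:-1]
    return parity


if __name__ == "__main__":
    rng = random.Random(0)
    info = [rng.getrandbits(1) for _ in range(100000)]
    # Example: g(D) = (1 + D^4) / (1 + D + D^2 + D^3 + D^4), |S_p| = 5.
    par = biased_cc_parity(info, [1, 0, 0, 0, 1], [1, 1, 1, 1, 1],
                           {8, 9, 10, 12, 15}, 0)
    print("empirical Pr(parity = u_p):", par.count(0) / len(par))
    print("prediction from (5):       ", 0.5 * (1 + 5 / 16))
```

With $m = 4$ and $|\mathcal{S}_p| = 5$, the prediction from (5) is $21/32$; the empirical frequency should approach this value as the number of bits grows.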

The biased CC can be regarded as a CC with state-dependent
puncturing and insertion of pilots: The known pilot
bit $u_p$ is sent, instead of the parity bit (2), when the state
vector is contained in $\mathcal{S}_p$. In contrast, the insertion position of
pilots is state-independent for the TDMP-based scheme. State-dependent
puncturing is regarded as a joint coding scheme for
embedding training information into parity bit sequences.

The parity bit of a biased CC is biased on long-term average.
A simple method of eliminating this long-term bias would be
to bit-flip the parity bit (4) for odd time $k$. For the parallel
concatenation of two biased CCs with the same generating
rational function (1) and $\mathcal{S}_p$, like turbo codes [22], another
simple method is available: The parallel concatenation of two
biased CCs with $u_p = 0$ and $u_p = 1$ makes the parity
bits unbiased on long-term average, whereas the occurrence
probability (5) of each parity bit is biased. In this paper,
we refer to this parallel concatenation of two biased CCs
as "biased turbo code." Throughout this paper, we consider
biased turbo codes with rate 1/3.

III. Iterative Channel Estimation and Decoding 
A. Overview 

We consider transmission of biased-turbo-coded data sym- 
bols with bit-interleaved coded modulation (BICM) over a 
frequency-flat fading channel. For simplicity, Gray-mapped 
quadrature phase shift keying (QPSK) and random uniform 
interleaving are used. Furthermore, a Rayleigh block-fading
channel with coherence time $T_c$ is assumed: The channel gain
is kept constant during $T_c$ time slots, and at the beginning of
the next fading block it is independently sampled from the
circularly symmetric complex Gaussian (CSCG) distribution
with unit variance. The received signal $y_{t,n} \in \mathbb{C}$ at the $t$th
time slot within fading block $n$ is given by

$$y_{t,n} = h_n x_{t,n} + w_{t,n}, \quad w_{t,n} \sim \mathcal{CN}(0, N_0), \tag{6}$$

for $t = 1, \ldots, T_c$ and $n = 1, \ldots, N$. In (6), $h_n \sim \mathcal{CN}(0, 1)$
denotes the channel gain in fading block $n$. Furthermore, $x_{t,n}$
represents the biased-turbo-coded QPSK data symbol with
$\mathbb{E}[|x_{t,n}|^2] = 1$ at the $t$th time slot within fading block $n$.
The random variable $w_{t,n}$ denotes the additive white Gaussian
noise (AWGN) with variance $N_0$. Let $M$ denote the code
length of the biased turbo code. The use of QPSK implies
that the number of fading blocks is given by $N = M/(2T_c)$.

Fig. 1. Iterative LMMSE channel estimation and decoding. $\pi_1$ (resp. $\pi_1^{-1}$)
represents the interleaver (resp. de-interleaver).

Fig. 2. Turbo decoder. $\pi_2$ (resp. $\pi_2^{-1}$) represents the interleaver (resp.
de-interleaver).
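The block-fading model (6) is straightforward to simulate. The sketch below draws one channel gain per fading block and adds proper complex Gaussian noise to each symbol; the helper names `cscg` and `block_fading_channel` are hypothetical, and only the Python standard library is used.

```python
import math
import random


def cscg(variance, rng):
    """One sample from CN(0, variance): independent real and imaginary
    parts, each with variance/2."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def block_fading_channel(x, T_c, N0, rng):
    """Apply (6) to a list x of N*T_c complex symbols.

    Returns (y, h): the received signals and the N channel gains, where
    h[n] is held fixed over the T_c slots of fading block n.
    """
    if len(x) % T_c != 0:
        raise ValueError("len(x) must be a multiple of T_c")
    y, h = [], []
    for n in range(len(x) // T_c):
        h_n = cscg(1.0, rng)  # h_n ~ CN(0, 1), independent across blocks
        h.append(h_n)
        for t in range(T_c):
            y.append(h_n * x[n * T_c + t] + cscg(N0, rng))
    return y, h


if __name__ == "__main__":
    rng = random.Random(0)
    qpsk = [complex(1, 1) / math.sqrt(2)] * 80  # placeholder QPSK symbols
    y, h = block_fading_channel(qpsk, T_c=40, N0=0.5, rng=rng)
    print(len(y), "received samples over", len(h), "fading blocks")
```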

Figure 1 shows the proposed iterative receiver based on 
BP [14], [15]. In the initial iteration, the LMMSE channel 
estimator uses the a priori information (5) about the bias of 
the data symbols to estimate the channel gains $\{h_n\}$. The
obtained CSI is sent to the demodulator, which detects the
data symbols $\{x_{t,n}\}$. More precisely, the demodulator calculates
the extrinsic probability of each data symbol. After de-interleaving,
the obtained probabilities are fed forward to the
turbo decoder, which calculates estimates for the systematic 
and parity bits in the form of extrinsic log-likelihood ratios 
(LLRs). The obtained estimates are interleaved and re-mapped 
to refine the initial channel estimates. After several iterations 
of the iterative channel estimation and decoding, the receiver 
obtains the decisions of the information bits. 

B. Turbo Decoder 

The turbo decoder consists of two maximum a posteriori 
(MAP) decoders for the biased CCs, as shown in Fig. 2. 
The MAP decoding for the two biased CCs can be effi- 
ciently performed by the Bahl-Cocke-Jelinek-Raviv (BCJR) 
algorithms [32], so that the two MAP decoders are called 
"BCJR decoders" in this paper. We refer to the BCJR decoder 
for the biased CC with $u_p = 0$ (resp. $u_p = 1$) as BCJR
decoder 0 (resp. decoder 1). BCJR decoder 0 calculates the
extrinsic LLRs of the systematic bits $b_k$ by utilizing the
extrinsic probabilities of the data symbols sent from the
demodulator as a priori information. BCJR decoder 1 uses the
extrinsic LLRs calculated in BCJR decoder 0 and the a priori
information provided from the demodulator to perform the
BCJR decoding. The obtained extrinsic LLRs of the systematic
bits are fed back to BCJR decoder 0 to refine the initial
estimates. After several iterations, the turbo decoder obtains
estimates for the systematic bits. In the last iteration, the
extrinsic LLRs of the parity bits $u_k^{(0)}$ and $u_k^{(1)}$, given by (4),
are also calculated in the two BCJR decoders. We write the
extrinsic LLRs of the systematic bit $b_k$ and the parity bit
$u_k^{(u_p)}$ after $j$ calculations of BCJR decoding in iteration $i$ for
the iterative channel estimation and decoding as $L_k^{(i,s)}(j)$ and
$L_k^{(i,u_p)}(j)$, respectively. The sign of the LLRs is taken such
that a positive LLR implies that bit 0 is more likely than bit 1.
The turbo decoder is basically the same as the iterative decoder 
for the conventional turbo codes [22]. The only difference is 
that the BCJR algorithm for the biased CCs is used, instead 
of that for unbiased CCs. See [33] for the details. 

We refer to iterations in the turbo decoder and the iterative
channel estimation and decoding as inner and outer iterations,
respectively. The number of outer iterations is denoted by $I$,
whereas the number of inner iterations in outer
iteration $i$ is written as $J_i$.

C. Soft Symbol Mapper 

Let $P^{(i)}(x_{t,n}) = P^{(i)}(\Re[x_{t,n}]) P^{(i)}(\Im[x_{t,n}])$ denote the a
priori probability of the data symbol $x_{t,n}$ passed from the
soft symbol mapper to the LMMSE channel estimator in outer
iteration $i$. When the systematic bit $b_k \in \mathbb{F}_2$ is mapped to the
real part of $x_{t,n}$ with bit-interleaving and the standard mapping
$\Re[x_{t,n}] = (1 - 2b_k)/\sqrt{2}$, the a priori probability $P^{(i)}(\Re[x_{t,n}])$
is given by

$$P^{(i)}\left(\Re[x_{t,n}] = \frac{1}{\sqrt{2}}\right) = \frac{e^{L_k^{(i,s)}(J_i)/2}}{e^{L_k^{(i,s)}(J_i)/2} + e^{-L_k^{(i,s)}(J_i)/2}}. \tag{7}$$

The a priori probability $P^{(i)}(\Im[x_{t,n}])$ is calculated in the same
manner as for (7).

Let $\hat{x}_{t,n}^{(i)} \in \mathbb{C}$ denote the soft decision of the data symbol
$x_{t,n}$ based on the a priori probability $P^{(i)}(x_{t,n})$, given by

$$\hat{x}_{t,n}^{(i)} = \sum_{\{x_{t,n}\}} x_{t,n} P^{(i)}(x_{t,n}). \tag{8}$$

The a priori probability (7) implies that, when $b_k$ is mapped
to $\Re[x_{t,n}]$, the real part of (8) is given by

$$\Re[\hat{x}_{t,n}^{(i)}] = \frac{1}{\sqrt{2}} \tanh\left(\frac{L_k^{(i,s)}(J_i)}{2}\right). \tag{9}$$
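The mapping from extrinsic LLRs to a soft QPSK symbol in (7)-(9) can be sketched in a few lines. The function names `prior_prob_plus` and `soft_symbol` are hypothetical; the sign convention (a positive LLR favors bit 0, which maps to $+1/\sqrt{2}$) follows the text.

```python
import math


def prior_prob_plus(L):
    """A priori probability (7) that a component equals +1/sqrt(2).

    e^{L/2} / (e^{L/2} + e^{-L/2}) is written as 1 / (1 + e^{-L}).
    """
    return 1.0 / (1.0 + math.exp(-L))


def soft_symbol(L_re, L_im):
    """Soft decision (8) of a Gray-mapped QPSK symbol from the LLRs of the
    bits carried by its real and imaginary parts, using (9) per component."""
    return complex(math.tanh(L_re / 2.0), math.tanh(L_im / 2.0)) / math.sqrt(2)


if __name__ == "__main__":
    print(soft_symbol(0.0, 0.0))    # no prior knowledge: zero soft decision
    print(soft_symbol(1.2, -0.4))   # partial knowledge: |x_hat| < 1
```

Averaging $\pm 1/\sqrt{2}$ over the probabilities in (7) gives $(2P - 1)/\sqrt{2}$, which equals the $\tanh$ form in (9).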
D. LMMSE Channel Estimator and Demodulator

Since block-fading has been assumed, the LMMSE channel
estimation is performed block by block. We focus on time
slot $t$ within fading block $n$. The LMMSE channel estimator
calculates an approximate extrinsic pdf $p_{t,n}^{(i+1)}(h_n)$ of the
channel gain $h_n$ given $\{y_{t',n} : t' = 1, \ldots, T_c, t' \neq t\}$ and
$\{P^{(i)}(x_{t',n}) : t' = 1, \ldots, T_c, t' \neq t\}$. The demodulator uses
the received signal $y_{t,n}$ and the extrinsic pdf $p_{t,n}^{(i+1)}(h_n)$ to
compute the extrinsic probability $Q^{(i+1)}(x_{t,n})$ of the data
symbol $x_{t,n}$, defined as

$$Q^{(i+1)}(x_{t,n}) \propto \int p_{\mathrm{CG}}(y_{t,n}; h_n x_{t,n}, N_0)\, p_{t,n}^{(i+1)}(h_n)\, dh_n, \tag{10}$$

where the proper complex Gaussian pdf $p_{\mathrm{CG}}(y_{t,n}; h_n x_{t,n}, N_0)$
represents the fading channel (6).

The quantities $y_{t,n}$ and $P^{(i)}(x_{t,n})$ for time slot $t$ are
excluded in calculating $p_{t,n}^{(i+1)}(h_n)$ because of the definition
of BP [15]. This exclusion stabilizes the convergence of the
iterative receiver, and allows us to perform DE analysis for a
sufficiently long length of interleaving.

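We shall derive the extrinsic pdf $p_{t,n}^{(i+1)}(h_n)$ for the LMMSE
channel estimator, following [13], [19], [34]. We first decompose
the first term on the right-hand side (RHS) of (6) into
two terms, i.e.

$$y_{t',n} = h_n \hat{x}_{t',n}^{(i)} + h_n \left(x_{t',n} - \hat{x}_{t',n}^{(i)}\right) + w_{t',n}, \tag{11}$$

for $t' \neq t$, where $\hat{x}_{t',n}^{(i)}$ denotes the soft decision (8) in outer
iteration $i$. Next, the second term on the RHS of (11) is
approximated by an independent CSCG random variable $\zeta_{t',n}^{(i)}$
with variance $1 - |\hat{x}_{t',n}^{(i)}|^2$, which is the same as the variance
of the original term given $\hat{x}_{t',n}^{(i)}$. The obtained approximate
channel can be regarded as a Gaussian single-input multiple-output
(SIMO) channel with Gaussian input $h_n \sim \mathcal{CN}(0, 1)$
and AWGN $\zeta_{t',n}^{(i)} + w_{t',n} \sim \mathcal{CN}(0, N_0 + 1 - |\hat{x}_{t',n}^{(i)}|^2)$ [2]. Thus,
the extrinsic pdf $p_{t,n}^{(i+1)}(h_n)$ for the LMMSE channel estimator
is given as the a posteriori pdf of $h_n$ given $\{y_{t',n} : t' \neq t\}$
and $\{\hat{x}_{t',n}^{(i)} : t' \neq t\}$ in this SIMO channel,

$$p_{t,n}^{(i+1)}(h_n) = p_{\mathrm{CG}}\left(h_n; \hat{h}_{t,n}^{(i+1)}, \xi_{t,n}^{(i+1)}\right). \tag{12}$$

In (12), the LMMSE estimate $\hat{h}_{t,n}^{(i+1)}$ and the mean-squared
error (MSE) $\xi_{t,n}^{(i+1)}$ are given by

$$\hat{h}_{t,n}^{(i+1)} = \xi_{t,n}^{(i+1)} \sum_{t'=1,\, t' \neq t}^{T_c} \frac{\left(\hat{x}_{t',n}^{(i)}\right)^* y_{t',n}}{N_0 + 1 - |\hat{x}_{t',n}^{(i)}|^2}, \tag{13}$$

$$\xi_{t,n}^{(i+1)} = \left(1 + \sum_{t'=1,\, t' \neq t}^{T_c} \frac{|\hat{x}_{t',n}^{(i)}|^2}{N_0 + 1 - |\hat{x}_{t',n}^{(i)}|^2}\right)^{-1}, \tag{14}$$

respectively. See Appendix A for the details of the derivation.
For (12), the extrinsic probability (10) reduces to

$$Q^{(i+1)}(x_{t,n}) \propto p_{\mathrm{CG}}\left(y_{t,n}; \hat{h}_{t,n}^{(i+1)} x_{t,n}, \xi_{t,n}^{(i+1)} + N_0\right). \tag{15}$$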
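The extrinsic LMMSE estimate and its MSE, (13) and (14), take the form of a posterior mean and variance for a Gaussian prior observed through a Gaussian SIMO channel in which slot $t'$ has noise variance $N_0 + 1 - |\hat{x}_{t',n}^{(i)}|^2$. The sketch below computes both for one fading block under that reading; the function name `lmmse_extrinsic` and its argument layout are hypothetical.

```python
def lmmse_extrinsic(y, x_hat, t, N0):
    """Extrinsic LMMSE channel estimate and MSE for slot t of one block.

    y     : received signals y_{t',n} of the block (complex numbers).
    x_hat : soft decisions of the data symbols for the same slots.
    t     : index of the slot excluded from the estimate (extrinsic rule).
    N0    : AWGN variance.
    Returns (h_hat, xi), the posterior mean and variance of h_n.
    """
    inv_mse = 1.0      # prior precision of h_n ~ CN(0, 1)
    matched = 0j       # noise-weighted matched-filter output
    for tp, (y_tp, xh) in enumerate(zip(y, x_hat)):
        if tp == t:
            continue   # exclude the slot being demodulated
        noise = N0 + 1.0 - abs(xh) ** 2   # AWGN plus residual data interference
        inv_mse += abs(xh) ** 2 / noise
        matched += xh.conjugate() * y_tp / noise
    xi = 1.0 / inv_mse
    return xi * matched, xi


if __name__ == "__main__":
    # Uninformative soft decisions: the estimate ignores the received signals.
    print(lmmse_extrinsic([1 + 1j, 2 - 1j], [0j, 0j], t=0, N0=0.5))
```

When every soft decision is zero, the estimate is 0 with MSE 1, independent of the received signals, which is why some prior knowledge about the symbols is needed to start the iteration.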

In the initial outer iteration, the a priori information (5)
about the bias is used to define the a priori probability
$P^{(0)}(x_{t,n})$: The initial LLRs $L_k^{(0,s)}(J_0)$ for the systematic bits
are equal to zero for all $k$, since the systematic bits are
unbiased. On the other hand, the initial LLRs $L_k^{(0,u_p)}(J_0)$ for
the parity bits are set to $L_k^{(0,u_p)}(J_0) = (1 - 2u_p) L_{\mathrm{init}}$, given by

$$L_{\mathrm{init}} = \ln \frac{1 + 2^{-m}|\mathcal{S}_p|}{1 - 2^{-m}|\mathcal{S}_p|}, \tag{16}$$

where we have used the a priori information (5) about the bias.
These initial LLRs are used to define the a priori probability
$P^{(0)}(x_{t,n})$. If the parity bits were unbiased, the initial soft
decision $\hat{x}_{t,n}^{(0)}$, given by (8), would be zero. Consequently, the
LMMSE channel estimate (13) would be independent of the
received signals. However, $\hat{x}_{t,n}^{(0)}$ can be non-zero for biased
turbo codes, so that the LMMSE channel estimator can obtain
a rough estimate of $h_n$ in the initial outer iteration.
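A short numerical illustration of the initial LLR may help. Assuming the bias probability in (5) equals $(1 + 2^{-m}|\mathcal{S}_p|)/2$, the log-ratio of the two parity-bit probabilities is $\ln[(1 + a)/(1 - a)]$ with $a = 2^{-m}|\mathcal{S}_p|$, and the corresponding soft value $\tanh(L_{\mathrm{init}}/2)$ equals $a$. The helper name `l_init` below is hypothetical.

```python
import math


def l_init(m, size_Sp):
    """Initial parity LLR magnitude for memory m and |S_p| = size_Sp.

    Assumes Pr(parity = u_p) = (1 + a)/2 with a = size_Sp / 2**m, so that
    the log-ratio is ln((1 + a)/(1 - a)). Requires size_Sp < 2**m.
    """
    a = size_Sp / 2 ** m
    return math.log((1.0 + a) / (1.0 - a))


if __name__ == "__main__":
    L = l_init(4, 5)
    # tanh(L/2) recovers a = 5/16, the magnitude of each biased soft component
    # before the 1/sqrt(2) QPSK scaling.
    print(L, math.tanh(L / 2.0))
```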

IV. Density Evolution Analysis 
A. Iterative Channel Estimation and Decoding 

Under the assumption of random uniform interleaving, we
present DE with the Gaussian approximation in the limit where
the code length tends to infinity. For simplicity, the coherence
time $T_c$ is assumed to be sufficiently long. Mathematically,
the limit $T_c \to \infty$ is taken after the infinite code length limit.
Note that the two limits do not commute with each other.

Let us analyze the convergence property of the iterative
LMMSE channel estimation and decoding. Expression (15)
implies that each data symbol $x_{t,n}$ can be regarded as if it
were transmitted over a virtual fading channel with perfect
CSI at the receiver,

$$y_{t,n} = \hat{h}_{t,n}^{(i+1)} x_{t,n} + w_{t,n}^{(i+1)}, \quad w_{t,n}^{(i+1)} \sim \mathcal{CN}\left(0, \xi_{t,n}^{(i+1)} + N_0\right). \tag{17}$$

It is straightforward to find that $\hat{h}_{t,n}^{(i+1)}$ given $\{\hat{x}_{t',n}^{(i)}\}$ follows
the CSCG distribution with variance $1 - \xi_{t,n}^{(i+1)}$. In fact, the
assumptions of Rayleigh fading and QPSK imply that the true
received signal $y_{t,n}$ given by (6) follows a CSCG distribution,
so that the LMMSE estimate (13) conditioned on $\{\hat{x}_{t',n}^{(i)}\}$ is
also CSCG. Furthermore, the LMMSE estimate $\hat{h}_{t,n}^{(i+1)}$ and the
estimation error $h_n - \hat{h}_{t,n}^{(i+1)}$ are uncorrelated, which
implies that the variance of $\hat{h}_{t,n}^{(i+1)}$ is equal to $1 - \xi_{t,n}^{(i+1)}$
because of $\mathbb{E}[|h_n|^2] = 1$. Thus, we find
$\hat{h}_{t,n}^{(i+1)} \sim \mathcal{CN}(0, 1 - \xi_{t,n}^{(i+1)})$.

The observations above imply that the convergence property
of the iterative LMMSE channel estimation and decoding is
characterized by the MSE (14) or equivalently the effective
SNR $\gamma(\xi_{t,n}^{(i+1)})$, i.e. the ratio of the variance of the channel
estimate to the noise variance in (17), given by

$$\gamma(\xi) = \frac{1 - \xi}{\xi + N_0}. \tag{18}$$

The effective SNR is a function of the MSE $\xi_{t,n}^{(i+1)}$, which is
a random variable that depends on $\{\hat{x}_{t',n}^{(i)}\}$. When the code
length tends to infinity, the soft decisions $\{\hat{x}_{t',n}^{(i)}\}$, given by
(8), are expected to be independent for all $t'$ and $n$ [35]. The
weak law of large numbers implies that the MSE (14) can be
approximated by a deterministic value

$$\xi^{(i+1)} = \frac{1}{1 + T_c\, \eta^{(i+1)}} \tag{19}$$

for sufficiently large $T_c$, with

$$\eta^{(i+1)} = \mathbb{E}\left[\frac{|\hat{x}_{t,n}^{(i)}|^2}{N_0 + 1 - |\hat{x}_{t,n}^{(i)}|^2}\right], \tag{20}$$

where the expectation is taken with respect to $\hat{x}_{t,n}^{(i)}$.



In order to evaluate the expectation on the RHS of (20),
we first present several definitions. Let $p_{L_k^{(i,s)}(j)|b_k}(\cdot|\cdot)$ and
$p_{L_k^{(i,u_p)}(j)|u_k^{(u_p)}}(\cdot|\cdot)$ denote the conditional pdfs of the LLRs
$L_k^{(i,s)}(j)$ and $L_k^{(i,u_p)}(j)$ given $b_k$ and $u_k^{(u_p)}$, respectively. Since
the conditional pdfs are identical for all $k$ when the code length
tends to infinity, we drop the subscripts $k$ to re-write them as
$p_{L^{(i,s)}(j)|b}$ and $p_{L^{(i,u_p)}(j)|u^{(u_p)}}$. The set of the six conditional
pdfs for the LLRs is denoted by

$$\mathcal{P}^{(i)}(j) = \left\{ p_{L^{(i,s)}(j)|b}(\cdot|0),\; p_{L^{(i,s)}(j)|b}(\cdot|1),\; p_{L^{(i,0)}(j)|u^{(0)}}(\cdot|0),\; p_{L^{(i,0)}(j)|u^{(0)}}(\cdot|1),\; p_{L^{(i,1)}(j)|u^{(1)}}(\cdot|0),\; p_{L^{(i,1)}(j)|u^{(1)}}(\cdot|1) \right\}. \tag{21}$$

Furthermore, we define weight functionals of $p \in \mathcal{P}^{(i)}(j)$ as

$$W\left(p_{L^{(i,s)}(j)|b}(\cdot|b)\right) = \frac{1}{2} \quad \text{for } b = 0, 1, \tag{22}$$

$$W\left(p_{L^{(i,u_p)}(j)|u^{(u_p)}}(\cdot|u)\right) = \frac{1}{2}\left\{1 + (2\delta_{u,u_p} - 1)\, 2^{-m}|\mathcal{S}_p|\right\}. \tag{23}$$

The latter corresponds to the occurrence probability (5) of the
parity bits. Finally, we define a functional $\eta(p, q)$ of two pdfs
$p(\cdot)$ and $q(\cdot)$ as

$$\eta(p, q) = \iint \frac{|\hat{x}|^2(L_r, L_i)}{N_0 + 1 - |\hat{x}|^2(L_r, L_i)}\, p(L_r)\, q(L_i)\, dL_r\, dL_i, \tag{24}$$

where $|\hat{x}|^2(L_r, L_i)$ denotes the squared soft decision of $x_{t,n}$
for the case where the extrinsic LLRs for $\Re[x_{t,n}]$ and $\Im[x_{t,n}]$
are given by $L_r$ and $L_i$, respectively,

$$|\hat{x}|^2(L_r, L_i) = \frac{1}{2}\tanh^2\left(\frac{L_r}{2}\right) + \frac{1}{2}\tanh^2\left(\frac{L_i}{2}\right). \tag{25}$$

Note that (24) is equal to (20) when the two pdfs $p(\cdot)$ and
$q(\cdot)$ coincide with those for the LLRs corresponding to the
real and imaginary parts of $\hat{x}_{t,n}^{(i)}$, respectively.

We next calculate the expectation on the RHS of (20).
The assumption of random uniform bit-interleaving implies
that the systematic and parity bits for the two biased CCs
are mapped to $\Re[x_{t,n}]$ uniformly and randomly. Since the
occurrence probabilities of the parity bits are biased, as shown
in (5), the pdf $p(L_r)$ of the LLR corresponding to $\Re[\hat{x}_{t,n}^{(i)}]$ is
given by

$$p(L_r) = \frac{1}{3} \sum_{q \in \mathcal{P}^{(i)}(J_i)} W(q)\, q(L_r), \tag{26}$$

with (21), (22), and (23). Repeating the same argument for
the imaginary part $\Im[x_{t,n}]$, we find that (20) reduces to

$$\eta^{(i+1)} = \frac{1}{9} \sum_{p, q \in \mathcal{P}^{(i)}(J_i)} W(p)\, W(q)\, \eta(p, q), \tag{27}$$

where we have used the linearity of the functional (24).
Finally, let us calculate $\eta^{(1)}$ for the initial iteration. For
$i = 0$, we have

$$p_{L^{(0,s)}(J_0)|b}(L|\cdot) = \delta(L), \tag{28}$$

$$p_{L^{(0,u_p)}(J_0)|u^{(u_p)}}(L|\cdot) = \delta\left(L - (1 - 2u_p) L_{\mathrm{init}}\right), \tag{29}$$

where $L_{\mathrm{init}}$ is given by (16). Applying (28) and (29) to (27)
with $i = 0$, after some algebra, we obtain

$$\eta^{(1)} = \frac{4}{9}\left( \frac{2^{-2m}|\mathcal{S}_p|^2}{2(N_0 + 1) - 2^{-2m}|\mathcal{S}_p|^2} + \frac{2^{-2m}|\mathcal{S}_p|^2}{N_0 + 1 - 2^{-2m}|\mathcal{S}_p|^2} \right). \tag{30}$$
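The initial point of the DE recursion can be evaluated numerically. The sketch below assumes that, at $i = 0$, each real or imaginary component of a symbol carries a systematic bit, a parity bit with $u_p = 0$, or a parity bit with $u_p = 1$ with probability 1/3 each, that systematic soft values are zero, and that parity soft values satisfy $\tanh^2(L/2) = 2^{-2m}|\mathcal{S}_p|^2$. It compares a closed form obtained under these assumptions with a direct enumeration of the nine component pairs, then evaluates the MSE (19). The effective SNR is taken as the ratio implied by the virtual channel (17), which is also an assumption; all function names are hypothetical.

```python
from itertools import product


def eta_initial_closed_form(m, size_Sp, N0):
    """Closed-form initial eta for memory m, |S_p| = size_Sp, noise N0."""
    a2 = (size_Sp / 2 ** m) ** 2
    return (4.0 / 9.0) * (a2 / (2.0 * (N0 + 1.0) - a2) + a2 / (N0 + 1.0 - a2))


def eta_initial_enumerated(m, size_Sp, N0):
    """Same quantity by enumerating the 3 x 3 bit types of (real, imag)."""
    a2 = (size_Sp / 2 ** m) ** 2
    tanh_sq = [0.0, a2, a2]   # systematic, parity (u_p = 0), parity (u_p = 1)
    total = 0.0
    for t_re, t_im in product(tanh_sq, repeat=2):
        x_sq = 0.5 * t_re + 0.5 * t_im     # squared soft decision, as in (25)
        total += x_sq / (N0 + 1.0 - x_sq)  # integrand of (24)
    return total / 9.0


def mse_from_eta(eta, T_c):
    """Deterministic MSE approximation (19)."""
    return 1.0 / (1.0 + T_c * eta)


def effective_snr(xi, N0):
    """Effective SNR read off from (17): Var(h_hat) / (xi + N0)."""
    return (1.0 - xi) / (xi + N0)


if __name__ == "__main__":
    N0 = 10 ** (-0.2 / 10)   # 1/N0 = 0.2 dB
    eta1 = eta_initial_closed_form(4, 5, N0)
    xi1 = mse_from_eta(eta1, T_c=40)
    print("eta^(1) =", eta1, " xi^(1) =", xi1,
          " effective SNR =", effective_snr(xi1, N0))
```

For unbiased parity bits ($|\mathcal{S}_p| = 0$) both functions return zero, so the initial MSE stays at 1 and the effective SNR is zero.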



B. Turbo Decoder 



We shall evaluate the six conditional pdfs contained in (21).
The probabilistic bias of the parity bits breaks the symmetry
$p_{L^{(i,u_p)}(j)|u^{(u_p)}}(L|0) = p_{L^{(i,u_p)}(j)|u^{(u_p)}}(-L|1)$, whereas
$p_{L^{(i,s)}(j)|b}(L|0) = p_{L^{(i,s)}(j)|b}(-L|1)$ holds approximately,
as shown in Fig. 3. Thus, we assess the five conditional
pdfs obtained by excluding $p_{L^{(i,s)}(j)|b}(\cdot|1)$ from (21). This
approximation for the systematic bits reduces the degrees of
freedom of DE equations. Without it, we would obtain coupled
DE equations with respect to the two pdfs for the systematic
bits. However, the approximation reduces the coupled DE
equations to a single DE equation with respect to the pdf
$p_{L^{(i,s)}(j)|b}(L|0)$.

Since the conditional pdfs are intractable in general, we use
the Gaussian approximation to replace the pdfs by tractable
pdfs. For that purpose, let us consider the binary-input AWGN
channel with the standard mapping $0 \mapsto 1$ and $1 \mapsto -1$,

$$y = x + w, \quad w \sim \mathcal{N}(0, v), \tag{31}$$

with the input $x \in \{1, -1\}$. It is known that the corresponding
LLR $L$ is given by

$$L = \ln \frac{p(y|x = 1)}{p(y|x = -1)} = \frac{2y}{v}, \tag{32}$$

which implies that the LLR (32) given $x$ follows the Gaussian
distribution $\mathcal{N}(2x/v, 4/v)$. Thus, the pdf of the LLR for the






binary-input AWGN channel is characterized by the single 
parameter v. In the Gaussian approximation, the pdfs of the 
LLRs are approximated by this restricted class of pdfs. Note 
that the Gaussian approximation is a rough approximation for 
the conditional pdfs, since the true conditional pdfs are not 
symmetric about their means, as shown in Fig. 3. However, 
the Gaussian approximation can provide a reasonably good 
approximation for the MSE (19). 
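The single-parameter Gaussian family of LLRs can be checked by simulation. The sketch below draws channel outputs from (31), forms the LLR (32), and leaves the comparison of the sample mean and variance with $2x/v$ and $4/v$ to the caller; the function name `llr_samples` is hypothetical.

```python
import random
import statistics


def llr_samples(x, v, n, rng):
    """Draw n LLRs L = 2y/v for y = x + w with w ~ N(0, v), x in {+1, -1}."""
    std = v ** 0.5
    return [2.0 * (x + rng.gauss(0.0, std)) / v for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    v = 0.5
    samples = llr_samples(+1, v, 50000, rng)
    print("sample mean:", statistics.fmean(samples), " expected:", 2 / v)
    print("sample var: ", statistics.pvariance(samples), " expected:", 4 / v)
```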

There is ambiguity in how to determine the parameter $v$.
In this paper, we follow the reference [15] to use entropy
matching: The variance parameter $v$ is determined such that
the entropy of the a priori probability (7) averaged over the
true pdf of the LLR coincides with that averaged over the
restricted pdf. Let $v_u^{(i,u_p)}(j)$ denote the variance parameter
corresponding to the conditional pdf $p_{L^{(i,u_p)}(j)|u^{(u_p)}}(\cdot|u)$ for
$u_p = 0, 1$ and $u = 0, 1$. Furthermore, the corresponding
entropy is written as $H_u^{(i,u_p)}(j) = H_{\mathrm{G}}(v_u^{(i,u_p)}(j))$, given by

$$H_{\mathrm{G}}(v) = \int S\left(\frac{e^{L/2}}{e^{L/2} + e^{-L/2}}\right) p_{\mathrm{G}}\left(L; \frac{2}{v}, \frac{4}{v}\right) dL. \tag{33}$$

In (33), $p_{\mathrm{G}}(\cdot; 2/v, 4/v)$ stands for the real Gaussian pdf with
mean $2/v$ and variance $4/v$. Furthermore, $S(p)$ denotes the
binary entropy function

$$S(p) = -p \log_2 p - (1 - p) \log_2 (1 - p). \tag{34}$$

The variance parameter $v_u^{(i,u_p)}(j)$ is determined such that the
entropy $H_{\mathrm{G}}(v_u^{(i,u_p)}(j))$ coincides with the true one,

$$\int S\left(\frac{e^{L/2}}{e^{L/2} + e^{-L/2}}\right) p_{L^{(i,u_p)}(j)|u^{(u_p)}}(L|u)\, dL, \tag{35}$$

where the RHS is estimated from numerical simulations of the
BCJR decoding.
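Entropy matching needs the entropy of the Gaussian LLR family as a function of $v$ and its inverse. The sketch below evaluates the integral of the binary entropy of $e^{L/2}/(e^{L/2} + e^{-L/2})$ against a Gaussian pdf with mean $2/v$ and variance $4/v$ by the trapezoidal rule, and inverts it by bisection on a logarithmic scale. The function names, grid size, integration width, and search interval are illustrative assumptions, not values from the paper.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits, with S(0) = S(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def H_G(v, n=4001, width=10.0):
    """Entropy of the Gaussian LLR family N(2/v, 4/v), by trapezoidal rule
    over mean +/- width standard deviations."""
    mean, std = 2.0 / v, 2.0 / math.sqrt(v)
    lo = mean - width * std
    step = 2.0 * width * std / (n - 1)
    total = 0.0
    for k in range(n):
        L = lo + k * step
        p0 = 1.0 / (1.0 + math.exp(-L))  # e^{L/2} / (e^{L/2} + e^{-L/2})
        pdf = math.exp(-0.5 * ((L - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * binary_entropy(p0) * pdf
    return total * step


def H_G_inv(H, lo=1e-3, hi=1e4, iters=60):
    """Invert H_G by bisection in log(v); relies on H_G increasing in v."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if H_G(mid) < H:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    v = H_G_inv(0.5)
    print("v matching entropy 0.5:", v, " check:", H_G(v))
```

A large $v$ (noisy channel) gives an entropy close to 1, and a small $v$ gives an entropy close to 0, so the bisection bracket covers essentially the whole range $(0, 1)$.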

Let $v^{(i,s)}(j)$ and $H^{(i,s)}(j) = H_{\mathrm{G}}(v^{(i,s)}(j))$ denote the
variance parameter and the entropy corresponding to the conditional
pdf $p_{L^{(i,s)}(j)|b}(\cdot|0)$, respectively. As seen from Fig. 2,
the variance $v^{(i,s)}(j)$ or equivalently the entropy $H^{(i,s)}(j)$
satisfies the following DE equation:

$$H^{(i,s)}(j) = \Gamma_s\left(H^{(i,s)}(j-1); \gamma(\xi^{(i)})\right), \quad H^{(i,s)}(0) = 1, \tag{36}$$

where $\gamma(\xi^{(i)})$ denotes the effective SNR (18). In (36), $\Gamma_s(\cdot; \cdot)$
is a function that is determined from $g(D)$, $u_p$, and $\mathcal{S}_p$
for the used biased CC. Since $\Gamma_s(\cdot; \gamma(\xi^{(i)}))$ for fixed $\xi^{(i)}$
is a monotonic function, there exists the inverse function
$\Gamma_s^{-1}(\cdot; \gamma(\xi^{(i)}))$ that satisfies

$$H^{(i,s)}(j-1) = \Gamma_s^{-1}\left(H^{(i,s)}(j); \gamma(\xi^{(i)})\right). \tag{37}$$

The function $\Gamma_s(\cdot; \cdot)$ can be estimated numerically: We consider
the biased-convolutional-coded Rayleigh fading channel (17)
with BICM, and assume that the LLRs for the
systematic bits follow independent and identically distributed
(i.i.d.) Gaussian distributions with mean $2/H_{\mathrm{G}}^{-1}(H^{(i,s)}(j-1))$
and variance $4/H_{\mathrm{G}}^{-1}(H^{(i,s)}(j-1))$, with $H_{\mathrm{G}}^{-1}(\cdot)$ denoting the
inverse function of (33). The entropy $H^{(i,s)}(j)$ is estimated
from numerical simulations of the BCJR decoding with perfect
CSI at SNR $\gamma(\xi^{(i)})$.

The entropies $\{H_u^{(i,u_p)}(j)\}$ for the parity bits are given by

$$\left(H_0^{(i,u_p)}(j), H_1^{(i,u_p)}(j)\right) = \Gamma_{u_p}\left(H^{(i,s)}(j-1); \gamma(\xi^{(i)})\right), \tag{38}$$

for $u_p = 0, 1$, where $\Gamma_{u_p}(\cdot; \cdot)$ is numerically estimated in the
same manner as for $\Gamma_s(\cdot; \cdot)$. Note that $\{H_u^{(i,u_p)}(j)\}$ for fixed
$\xi^{(i)}$ are uniquely determined from the entropy $H^{(i,s)}(j)$ for
the systematic bits via (37).

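In the Gaussian approximation with entropy matching, the
six conditional pdfs in (21) are approximated by the six
Gaussian pdfs

$$\left\{ p_{\mathrm{G}}\left(\cdot; \tfrac{2}{v^{(i,s)}(j)}, \tfrac{4}{v^{(i,s)}(j)}\right),\; p_{\mathrm{G}}\left(\cdot; \tfrac{-2}{v^{(i,s)}(j)}, \tfrac{4}{v^{(i,s)}(j)}\right),\; p_{\mathrm{G}}\left(\cdot; \tfrac{2}{v_0^{(i,0)}(j)}, \tfrac{4}{v_0^{(i,0)}(j)}\right),\; p_{\mathrm{G}}\left(\cdot; \tfrac{-2}{v_1^{(i,0)}(j)}, \tfrac{4}{v_1^{(i,0)}(j)}\right),\; p_{\mathrm{G}}\left(\cdot; \tfrac{2}{v_0^{(i,1)}(j)}, \tfrac{4}{v_0^{(i,1)}(j)}\right),\; p_{\mathrm{G}}\left(\cdot; \tfrac{-2}{v_1^{(i,1)}(j)}, \tfrac{4}{v_1^{(i,1)}(j)}\right) \right\}, \tag{39}$$

with $v_u^{(i,u_p)}(j) = H_{\mathrm{G}}^{-1}(H_u^{(i,u_p)}(j))$ for $u = 0, 1$ and $u_p = 0, 1$,
and with $v^{(i,s)}(j) = H_{\mathrm{G}}^{-1}(H^{(i,s)}(j))$.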

C. DE Equations 

In summary, we have derived DE equations with respect to
the MSE $\xi^{(i)}$ and the entropy $H^{(i,s)}(J_i)$ for the systematic
bits,

$$H^{(i,s)}(J_i) = \Phi(\xi^{(i)}), \tag{40}$$

$$\xi^{(i+1)} = \Psi(H^{(i,s)}(J_i)), \tag{41}$$

where the initial MSE $\xi^{(1)}$ is calculated with (19) and (30).
The maps on the RHSs of (40) and (41) are evaluated as
follows:

1) Let $i = 1$ and calculate $\xi^{(1)}$ with (19) and (30).

2) Iterate (36) with $\xi^{(i)}$ to evaluate $H^{(i,s)}(J_i)$.

3) Calculate $\{H_u^{(i,u_p)}(J_i)\}$ from $H^{(i,s)}(J_i)$, with (37) and
(38), for $u = 0, 1$ and $u_p = 0, 1$, and use (39), instead
of (21), to evaluate (27) with (22), (23), and (24). Then,
substitute $\eta^{(i+1)}$ into (19) to calculate $\xi^{(i+1)}$. If $i < I$,
go back to Step 2) after $i := i + 1$.
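The three steps above alternate between two maps: one from the MSE to the entropy after the inner iterations, as in (40), and one from that entropy to the next MSE, as in (41). The skeleton below shows only this control flow. The callables `Phi` and `Psi` are hypothetical names for the two maps; in practice they would be built from BCJR simulations and from (27) and (19), respectively. The toy maps in the demo are arbitrary stand-ins chosen solely to make the skeleton executable.

```python
def density_evolution(xi_1, Phi, Psi, num_outer):
    """Run the outer DE recursion for num_outer outer iterations.

    xi_1 : initial MSE (Step 1).
    Phi  : callable, MSE -> entropy after the inner iterations (Step 2).
    Psi  : callable, entropy -> next MSE (Step 3).
    Returns the trajectory [(xi^(1), H^(1)), (xi^(2), H^(2)), ...].
    """
    xi = xi_1
    trajectory = []
    for i in range(1, num_outer + 1):
        H = Phi(xi)               # Step 2: entropy of the systematic bits
        trajectory.append((xi, H))
        if i < num_outer:
            xi = Psi(H)           # Step 3: refined MSE for the next iteration
    return trajectory


if __name__ == "__main__":
    # Toy stand-ins, NOT the maps of the scheme analyzed in the text.
    toy_Phi = lambda xi: min(1.0, 2.0 * xi)
    toy_Psi = lambda H: 0.25 * H
    for xi, H in density_evolution(0.4, toy_Phi, toy_Psi, num_outer=4):
        print(f"xi = {xi:.4f}, H = {H:.4f}")
```

Plotting the returned pairs against the two maps gives the kind of zig-zag trajectory used to read off convergence.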

D. Decoding Thresholds 

We first investigate the convergence property of biased turbo
codes under the assumption of perfect CSI available at the
receiver. The function $\Gamma_s(\cdot; \cdot)$ in (36) can depend not only on
the cardinality $|\mathcal{S}_p|$ but also on the subset $\mathcal{S}_p$ itself. However,
by an exhaustive search considering all subsets of size up to
3, we observed that $\Gamma_s(\cdot; \cdot)$ is not sensitive to the choice of $\mathcal{S}_p$
and that it depends approximately only on the cardinality $|\mathcal{S}_p|$.
In this paper, a subset $\mathcal{S}_p$ is used such that the biased CC is
implemented with a small number of logic gates. Table I lists
the subsets $\mathcal{S}_p$ for the 16-state codes used in this paper. Each
element of $\mathcal{S}_p \subset \mathbb{F}_2^4$ is denoted by the integer whose 4-bit
binary representation gives the state vector. For
example, 0, 3, 12, and 15 for $|\mathcal{S}_p| = 4$ indicate that $\mathcal{S}_p$ consists
of the four vectors $(0, 0, 0, 0)^{\mathrm{T}}$, $(0, 0, 1, 1)^{\mathrm{T}}$, $(1, 1, 0, 0)^{\mathrm{T}}$, and
$(1, 1, 1, 1)^{\mathrm{T}}$.
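The integer labels of the state subsets can be expanded into state vectors mechanically. The helper below assumes, consistently with the example above, that the first vector component is the most significant bit; the function name `state_vector` is hypothetical.

```python
def state_vector(label, m=4):
    """Expand an integer state label into an m-bit tuple, MSB first."""
    if not 0 <= label < 2 ** m:
        raise ValueError("label out of range for memory m")
    return tuple((label >> (m - 1 - i)) & 1 for i in range(m))


if __name__ == "__main__":
    for label in (0, 3, 12, 15):
        print(label, "->", state_vector(label))
```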
TABLE I

|S_p|   S_p                  |S_p|   S_p
1       15                   5       8, 9, 10, 12, 15
2       13, 15               6       3, 5, 7, 11, 13, 15
3       10, 14, 15           7       9, 10, 11, 12, 13, 14, 15
4       0, 3, 12, 15         8       8, 9, 10, 11, 12, 13, 14, 15




Fig. 4. Evolution of $H^{(i,s)}(j)$ with respect to $j$ for $\gamma(\xi^{(i)}) = -0.3$ dB
and $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. (Axis label: entropy for systematic bits.)



Figure 4 shows the evolution of the entropy (36) for the systematic
bits, called the extrinsic information transfer (EXIT)
chart [26]. The zig-zag line, starting from $H^{(i,s)}(0) = 1$,
represents the trajectory of the entropy for the biased turbo
code with $|\mathcal{S}_p| = 5$. Note that the zero entropy $H^{(i,s)}(j) = 0$
corresponds to zero BER. The two functions $\Gamma_s(\cdot; \gamma(\xi^{(i)}))$
and $\Gamma_s^{-1}(\cdot; \gamma(\xi^{(i)}))$ have an intersection at the origin for
fixed $\gamma(\xi^{(i)})$. This corresponds to the intuition that the BCJR
decoding should result in perfect decoding of the systematic
bits, when the true values of the systematic bits are available
as a priori information. We find that the entropy $H^{(i,s)}(j)$
monotonically decreases as $j$ increases, and converges
toward 0 as $j \to \infty$, since the origin is the unique intersection
of the two functions. If they had another intersection with
strictly positive entropy, $H^{(i,s)}(j)$ could not converge to zero
as $j \to \infty$. The two functions get closer to each other when
$|\mathcal{S}_p|$ increases or when the effective SNR $\gamma(\xi^{(i)})$ decreases.
Thus, we define the decoding thresholds of biased turbo codes
in terms of the effective SNR (18) as follows:

Definition 2 (Threshold of Biased Turbo Code). The decoding
threshold of a biased turbo code is defined as the infimum of
$\gamma > 0$ such that the fixed-point equation $H = \Gamma_s(H; \gamma(\xi^{(i)}))$
for $H \in [0, 1]$ has the unique solution at $H = 0$ for all
$\gamma(\xi^{(i)}) > \gamma$.

Definition 2 implies that the turbo decoder with infinite 
iterations can achieve zero BER in the infinite code length 
limit, when the effective SNR $\gamma(\xi^{(i)})$ is larger than the 
decoding threshold. Otherwise, the BER converges to a strictly 
positive value after infinite iterations. 
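The fixed-point criterion in Definition 2 can be made concrete with a small numerical sketch. The transfer function below is a purely hypothetical stand-in chosen for illustration (it is not the BCJR-based function of the paper); its only relevant properties are that it maps zero entropy to zero and decreases as the SNR grows. The sketch checks on a grid whether the origin is the only fixed point and then bisects on the SNR, assuming that uniqueness, once attained, persists for all larger SNRs.

```python
import math


def toy_transfer(entropy, snr):
    """HYPOTHETICAL entropy transfer function used only for illustration."""
    return -math.expm1(-entropy / snr)  # equals 1 - exp(-entropy / snr)


def origin_is_unique_fixed_point(snr, grid_size=1000):
    """Grid check that transfer(H) < H for every H in (0, 1]."""
    return all(
        toy_transfer(k / grid_size, snr) < k / grid_size
        for k in range(1, grid_size + 1)
    )


def decoding_threshold(low=0.1, high=10.0, iterations=50):
    """Bisection for the smallest SNR at which H = 0 is the only fixed point.

    Assumes the uniqueness property is monotone in the SNR.
    """
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if origin_is_unique_fixed_point(mid):
            high = mid
        else:
            low = mid
    return high


def trajectory(snr, num_iterations):
    """Entropy after each decoder iteration, starting from H = 1."""
    entropy = 1.0
    path = [entropy]
    for _ in range(num_iterations):
        entropy = toy_transfer(entropy, snr)
        path.append(entropy)
    return path


if __name__ == "__main__":
    print("toy threshold (linear scale):", decoding_threshold())
    print("above threshold:", trajectory(2.0, 20)[-1])
    print("below threshold:", trajectory(0.5, 200)[-1])
```

Above the toy threshold the trajectory shrinks toward zero, whereas below it the trajectory stalls at a strictly positive fixed point, mirroring the two cases described after Definition 2.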

We next focus on the convergence property of the iterative 
Fig. 5. Evolution of $(\xi^{(i)}, H^{(i,s)}(J_i))$ with respect to $i$ for $1/N_0 = 0.2$ dB, 
$T_c = 40$, $J_i = 20$, $|\mathcal{S}_p| = 5$ ($\Pr(u_j^{(p)} = u_p) = 21/32$), and $g(D) = 
(1 + D^4)/\sum_{i=0}^{4} D^i$. 




Fig. 6. Evolution of $(\xi^{(i)}, H^{(i,s)}(J_i))$ with respect to $i$ for $1/N_0 = 
0.05$ dB, $T_c = 40$, $J_i = 20$, $|\mathcal{S}_p| = 5$ ($\Pr(u_j^{(p)} = u_p) = 21/32$), 
and $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. 



LMMSE channel estimation and decoding. Figures 5 and 6 
show the evolution of $(\xi^{(i)}, H^{(i,s)}(J_i))$ with respect to $i$ for 
$1/N_0 = 0.2$ dB and $1/N_0 = 0.05$ dB, respectively. The solid 
curves represent the function given by (40), whereas the 
dashed curves show the function given by (41). The 
trajectories of $(\xi^{(i)}, H^{(i,s)}(J_i))$ are shown by the zig-zag lines. 
In Fig. 5, we find that the MSE and the entropy decrease monotonically 
as $i$ grows, and that the entropy converges toward a 
value close to zero, whereas the MSE tends to a positive value 
distinct from zero. If the number of inner iterations tended 
to infinity, the entropy would converge to zero as $i \to \infty$. 
In Fig. 6, on the other hand, the two functions (40) and (41) 
have three intersections. As a result, $(\xi^{(i)}, H^{(i,s)}(J_i))$ tends to 
the fixed point of (40) and (41) that has the maximal entropy. 
These observations let us define the decoding threshold of the 
iterative LMMSE channel estimation and decoding as follows: 

Definition 3 (Threshold of Iterative LMMSE Channel Estima- 
tion and Decoding). The decoding threshold of the iterative 
LMMSE channel estimation and decoding is defined as the 
TABLE II

Decoding thresholds of the iterative LMMSE channel 
estimation and decoding for $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. 

            T_c -> oo    T_c = 60    T_c = 40    T_c = 20
|S_p| = 4   -0.59 dB     -0.22 dB    0.13 dB     2.09 dB
|S_p| = 5   -0.40 dB     -0.07 dB    0.11 dB     1.24 dB
|S_p| = 6   -0.05 dB     0.20 dB     0.34 dB     0.80 dB

infimum of SNR > 0 such that the DE equations (40) and 
(41) for $\xi^{(i)} \in [0, 1)$ have the unique fixed point with zero 
entropy in the limit $J_i \to \infty$ for all $1/N_0 >$ SNR. 

Note that the decoding threshold defined above depends 
on the coherence time $T_c$. When $T_c \to \infty$, the decoding 
threshold reduces to that for the turbo decoder, defined in 
Definition 2, since the LMMSE channel estimator can attain 
perfect CSI. Table II lists the decoding thresholds of the 
iterative LMMSE channel estimation and decoding. We find 
that, as the coherence time $T_c$ decreases, $|\mathcal{S}_p|$ or equivalently 
the magnitude of the bias (5) should be increased to obtain 
accurate channel estimates. For $T_c = 40$, the biased turbo 
code with $|\mathcal{S}_p| = 5$ minimizes the decoding threshold of the 
iterative LMMSE channel estimation and decoding among all 
possible cardinalities $|\mathcal{S}_p|$. 
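As a small illustration of how Table II can guide the choice of the bias, the following sketch picks, for a given coherence time, the tabulated cardinality with the lowest decoding threshold. It only covers the three cardinalities listed in Table II, so it says nothing about values outside the table; the function name is illustrative.

```python
import math

# Columns of Table II, in order.
COHERENCE_TIMES = (math.inf, 60, 40, 20)

# Decoding thresholds in dB, keyed by the cardinality |S_p| (rows of Table II).
THRESHOLDS_DB = {
    4: (-0.59, -0.22, 0.13, 2.09),
    5: (-0.40, -0.07, 0.11, 1.24),
    6: (-0.05, 0.20, 0.34, 0.80),
}


def best_subset_size(coherence_time):
    """Return the tabulated |S_p| with the smallest threshold for this T_c."""
    column = COHERENCE_TIMES.index(coherence_time)
    return min(THRESHOLDS_DB, key=lambda size: THRESHOLDS_DB[size][column])


if __name__ == "__main__":
    for t_c in COHERENCE_TIMES:
        print(t_c, best_subset_size(t_c))
```

The trend in the output matches the observation in the text: the shorter the coherence time, the larger the preferred cardinality among the tabulated ones.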

V. Numerical Simulations 

The proposed scheme based on the biased turbo code with 
rate $r = 1/3$ is compared to two pilot-based schemes with 
iterative LMMSE channel estimation and decoding. Unbiased 
turbo codes were used for the two pilot-based schemes. In 
all numerical simulations, $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$ was 
applied, which is a good constituent code of unbiased turbo 
codes with 16 states [15]. In the TDMP-based scheme, $\tau$ 
QPSK pilots with unit power and $(T_c - \tau)$ coded QPSK data 
symbols with unit power are sent during each fading block. 
In order to match the overall rates between different schemes, 
we used a turbo code with rate $r/(1 - \tau/T_c)$ for the TDMP- 
based scheme. The turbo code is obtained by puncturing the 
parity bits uniformly and randomly. In the SIP-based scheme, 
on the other hand, the transmitted symbol $x_{t,n}$ is given by the 
superposition $(d_{t,n} + \alpha p_{t,n})/\sqrt{1 + \alpha^2}$ of a coded QPSK data 
symbol $d_{t,n}$ with unit power and a QPSK pilot $p_{t,n}$ with unit 
power. The coefficient $(1 + \alpha^2)^{-1/2}$ normalizes the power 
of the transmitted symbol $x_{t,n}$ to 1. The overall rate of all 
schemes is equal to 2/3 bits/s/Hz, so that the receive SNR 
per information bit is $E_b/N_0 = 3/(2N_0)$ for all schemes. 
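The rate matching described above can be checked with a few lines of arithmetic. The sketch below computes the punctured code rate for the TDMP-based scheme, confirms that the overall rate is 2/3 bits/s/Hz, evaluates the fraction of transmit power left for data in the SIP-based scheme, and converts $1/N_0$ in dB to $E_b/N_0$ in dB. The values $\tau = 2$, $T_c = 40$, and $\alpha = 0.32$ are those quoted in the caption of Fig. 7; the function names are illustrative.

```python
from fractions import Fraction
import math

BITS_PER_QPSK_SYMBOL = 2


def tdmp_code_rate(base_rate, num_pilots, coherence_time):
    """Punctured code rate r / (1 - tau / T_c) used with time-multiplexed pilots."""
    return base_rate / (1 - Fraction(num_pilots, coherence_time))


def tdmp_overall_rate(code_rate, num_pilots, coherence_time):
    """Information bits per channel use once the pilot slots are accounted for."""
    data_fraction = 1 - Fraction(num_pilots, coherence_time)
    return BITS_PER_QPSK_SYMBOL * code_rate * data_fraction


def sip_data_power_fraction(alpha):
    """Share of the unit transmit power carried by the data symbol."""
    return 1.0 / (1.0 + alpha ** 2)


def eb_over_n0_db(inv_n0_db, overall_rate=Fraction(2, 3)):
    """Convert 1/N_0 in dB to E_b/N_0 in dB for unit-power symbols."""
    return inv_n0_db + 10.0 * math.log10(1.0 / float(overall_rate))


if __name__ == "__main__":
    rate = tdmp_code_rate(Fraction(1, 3), 2, 40)
    print("TDMP code rate:", rate)
    print("TDMP overall rate:", tdmp_overall_rate(rate, 2, 40))
    print("SIP data power fraction:", sip_data_power_fraction(0.32))
    print("Eb/N0 offset in dB:", eb_over_n0_db(0.0))
```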

The BERs of the three schemes are shown for $T_c = 40$ 
in Fig. 7. For the bias-based scheme, we used $|\mathcal{S}_p| = 5$ that 
achieves the best decoding threshold for $T_c = 40$, as shown in 
Table II. For the pilot-based schemes, on the other hand, we 
used the parameters $\tau$ and $\alpha$ that minimize the SNRs $E_b/N_0$ 
required for achieving a BER level between $10^{-3}$ and $10^{-4}$, 
respectively. The same criterion is used for the other figures in 
this paper. "Perfect CSI" indicates the BERs after 20 iterations 
in the turbo decoder with perfect CSI. Thus, these bounds for 
the TDMP-based scheme and the SIP-based scheme include a 
rate loss and an SNR loss due to transmission of pilot symbols, 
respectively. We find that the bias-based scheme can achieve 
Fig. 7. BER versus $E_b/N_0$ for $T_c = 40$, $M = 12240$ ($N = 153$), 
$I = 10$, $J_i = 4$ for $i < I$, $J_{10} = 20$, and $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. 
The parameters $|\mathcal{S}_p| = 5$ ($\Pr(u_j^{(p)} = u_p) = 21/32$), $\tau = 2$, and $\alpha = 0.32$ 
were used for the biased turbo code, the TDMP-based scheme, and the SIP- 
based scheme, respectively. 



the best performance in the moderate SNR regime, whereas 
it is inferior to the pilot-based schemes in the high SNR 
regime. The performance degradation in the high SNR regime occurs 
because the biased turbo code should have a smaller minimum 
distance than the corresponding unbiased turbo code. It is 
worth focusing on the gaps between the actual BERs and the 
perfect CSI bounds. The TDMP-based scheme has the largest 
gap among the three schemes, since it is impossible to adjust 
the discrete parameter $\tau$ flexibly. On the other hand, the SIP- 
based scheme allows one to adjust the continuous parameter 
$\alpha$. Consequently, the gap for the SIP-based scheme is smaller 
than for the TDMP-based scheme. However, the gap is not 
as small as expected in spite of the large pilot overhead, i.e., 
$\alpha = 0.32$, since there is interference between the data and 
pilot symbols. The biased turbo code mitigates interference 
between data and pilot by coding data and pilot jointly. As 
a result, the bias-based scheme can achieve the smallest gap 
between the actual BER and the lower bound. 

Figure 8 shows the MSEs for the LMMSE channel esti- 
mation versus the number of outer iterations. In the initial 
iteration, the MSE of the bias-based scheme is the worst among 
the three schemes. As the outer iteration proceeds, however, 
the MSE decreases quickly, and tends toward almost the same 
value as the MSEs of the other schemes. Note that, after 
10 outer iterations, the bias-based scheme can achieve the 
best BER for $E_b/N_0 = 4.0$ dB, as shown in Fig. 7. This 
observation is due to the overhead for training: For the TDMP- 
based scheme, a part of the parity bits is punctured to insert 
the pilots. The SIP-based scheme incurs a power loss for 
transmission of the SIPs. Furthermore, the superposition of the 
data and pilot symbols causes additional interference unless 
perfect CSI is available. On the other hand, the bias-based 
scheme needs no overhead for training. Consequently, the bias- 
based scheme can achieve the best BER for fixed quality of 
channel estimation. 

The BERs of the three schemes for $T_c = 60$ are shown in 
Fig. 8. MSE versus the number of outer iterations for $E_b/N_0 = 4.0$ dB, 
$T_c = 40$, $M = 12240$ ($N = 153$), $J_i = 4$ for $i < 10$, $J_{10} = 20$, and 
$g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. The parameters $|\mathcal{S}_p| = 5$ ($\Pr(u_j^{(p)} = 
u_p) = 21/32$), $\tau = 2$, and $\alpha = 0.32$ were used for the biased turbo code, 
the TDMP-based scheme, and the SIP-based scheme, respectively. 
Fig. 9. BER versus $E_b/N_0$ for $T_c = 60$, $M = 12240$ ($N = 102$), 
$I = 10$, $J_i = 4$ for $i < I$, $J_{10} = 20$, and $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. 
The parameters $|\mathcal{S}_p| = 5$ ($\Pr(u_j^{(p)} = u_p) = 21/32$), $\tau = 2$, and $\alpha = 0.28$ 
were used for the biased turbo code, the TDMP-based scheme, and the SIP- 
based scheme, respectively. 



Fig. 9. The BER for the SIP-based scheme is indistinguishable 
from that for the TDMP-based scheme in the low-to-moderate 
SNR regime. The SIP-based scheme could outperform the 
TDMP-based scheme if the coherence time were longer. The 
bias-based scheme with $|\mathcal{S}_p| = 5$ still achieves the best 
performance in the moderate SNR regime, although $|\mathcal{S}_p| = 4$ 
is better in terms of the decoding threshold shown in Table II. 
This observation implies that the bias-based scheme with fixed 
$|\mathcal{S}_p|$ is robust against an increase of the coherence time. 
Figure 10 shows the BERs of the three schemes for $T_c = 20$. 
We used the bias-based scheme with $|\mathcal{S}_p| = 8$, since the bias- 
based scheme with $|\mathcal{S}_p| = 5$ does not work for $T_c = 20$. We 
find that the BER of the bias-based scheme with $|\mathcal{S}_p| = 8$ 
is slightly superior to that of the TDMP-based scheme in the 
moderate SNR regime. 



Fig. 10. BER versus $E_b/N_0$ for $T_c = 20$, $M = 12240$ ($N = 306$), 
$I = 10$, $J_i = 4$ for $i < I$, $J_{10} = 20$, and $g(D) = (1 + D^4)/\sum_{i=0}^{4} D^i$. 
The parameters $|\mathcal{S}_p| = 8$ ($\Pr(u_j^{(p)} = u_p) = 3/4$), $\tau = 2$, and $\alpha = 0.5$ 
were used for the biased turbo code, the TDMP-based scheme, and the SIP- 
based scheme, respectively. 



VI. Conclusions 

We have proposed biased CCs to be used as constituent 
codes of concatenated codes with interleaving. The biased 
turbo code consisting of two biased CCs is suitable for iterative 
LMMSE channel estimation and decoding. The magnitude 
of the bias has a significant impact on the decoding 
threshold of the biased turbo code. The DE analysis allows us 
to design the magnitude of the bias according to the coherence 
time, in terms of the decoding threshold. Numerical simula- 
tions have shown that the bias-based scheme can outperform 
two pilot-based schemes in the moderate SNR regime, at the 
expense of a performance degradation in the high SNR regime. 

In this paper, the performance of the bias-based scheme 
has been analyzed in the moderate SNR regime. We could 
not investigate the impact of the subset $\mathcal{S}_p$ itself on the 
performance in the high SNR regime, since it is hard to 
estimate the BER in that regime numerically. The performance 
analysis in the high SNR regime is left as future work. 

Another interesting direction for future work is an extension to 
higher-order modulation schemes. Since lower-order modulation is 
preferred for training, one would need to design a biased 
constellation mapping appropriately. It may be worth combining 
biased signaling with constellation shaping. 

Appendix A 
Derivation of (13) and (14) 

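Let us consider a general $n$-dimensional Gaussian SIMO 
channel with Gaussian input $h \sim \mathcal{CN}(0, 1)$, 

$$\boldsymbol{y} = \boldsymbol{a} h + \boldsymbol{n} \in \mathbb{C}^n, \tag{42}$$

with $\boldsymbol{n} \sim \mathcal{CN}(\boldsymbol{0}, \boldsymbol{\Sigma})$. The goal of this appendix is to calculate 
the LMMSE estimator of $h$ and its MSE. It is well known 
that the posterior mean estimator of $h$ for this estimation problem 
is the LMMSE estimator of $h$, and that the posterior variance 
is equal to its MSE. Thus, we calculate the posterior pdf of $h$ 
conditioned on $\boldsymbol{y}$ and $\boldsymbol{a}$, defined as 

$$p(h | \boldsymbol{y}, \boldsymbol{a}) = \frac{p(\boldsymbol{y} | \boldsymbol{a}, h)\, p(h)}{p(\boldsymbol{y} | \boldsymbol{a})}, \tag{43}$$

with 

$$p(\boldsymbol{y} | \boldsymbol{a}, h) = \frac{1}{\pi^n \det \boldsymbol{\Sigma}} e^{-(\boldsymbol{y} - \boldsymbol{a} h)^{\mathrm{H}} \boldsymbol{\Sigma}^{-1} (\boldsymbol{y} - \boldsymbol{a} h)}, \tag{44}$$

$$p(h) = \frac{1}{\pi} e^{-|h|^2}. \tag{45}$$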



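Completing the square for the exponent of the numerator in 
(43) with respect to $h$, after some algebra, we obtain 

$$p(h | \boldsymbol{y}, \boldsymbol{a}) = \frac{1}{\pi \sigma^2} \exp\left( -\frac{|h - \sigma^2 \boldsymbol{a}^{\mathrm{H}} \boldsymbol{\Sigma}^{-1} \boldsymbol{y}|^2}{\sigma^2} \right), \tag{46}$$

with $\sigma^2 = (1 + \boldsymbol{a}^{\mathrm{H}} \boldsymbol{\Sigma}^{-1} \boldsymbol{a})^{-1}$. This expression implies that 
the LMMSE estimator $\hat{h}$ of $h$ and the MSE $\mathbb{E}[|h - \hat{h}|^2 | \boldsymbol{a}]$ are 
given by 

$$\hat{h} = \frac{\boldsymbol{a}^{\mathrm{H}} \boldsymbol{\Sigma}^{-1} \boldsymbol{y}}{1 + \boldsymbol{a}^{\mathrm{H}} \boldsymbol{\Sigma}^{-1} \boldsymbol{a}}, \tag{47}$$

$$\mathbb{E}[|h - \hat{h}|^2 | \boldsymbol{a}] = \frac{1}{1 + \boldsymbol{a}^{\mathrm{H}} \boldsymbol{\Sigma}^{-1} \boldsymbol{a}}, \tag{48}$$

respectively. 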
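The general formulas (47) and (48) are easy to evaluate numerically. The following sketch is a minimal illustration restricted to a diagonal noise covariance, which is an assumption made here to keep the code free of matrix inversions; the function name and argument layout are illustrative and not taken from the paper.

```python
def lmmse_scalar_estimate(a, noise_variances, y):
    """LMMSE estimate of h ~ CN(0, 1) from y = a * h + n.

    ASSUMPTION: the noise covariance is diagonal, with the k-th diagonal
    entry given by noise_variances[k].
    Returns the pair (estimate, mse) following (47) and (48).
    """
    # Quadratic form a^H Sigma^{-1} a.
    quad = sum(abs(a_k) ** 2 / s_k for a_k, s_k in zip(a, noise_variances))
    # Matched-filter output a^H Sigma^{-1} y.
    matched = sum(
        a_k.conjugate() * y_k / s_k
        for a_k, s_k, y_k in zip(a, noise_variances, y)
    )
    mse = 1.0 / (1.0 + quad)
    return mse * matched, mse


if __name__ == "__main__":
    a = [1 + 0j, 1j]
    noise_variances = [1.0, 1.0]
    y = [1 + 0j, 1j]  # noiseless observation of h = 1
    print(lmmse_scalar_estimate(a, noise_variances, y))
```

For this noiseless observation of $h = 1$ the estimate is shrunk toward the prior mean zero, and the MSE depends only on $\boldsymbol{a}$ and the noise variances, not on $\boldsymbol{y}$, as (48) indicates.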

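The LMMSE estimate (13) and the MSE (14) are obtained 
from these general formulas. For example, substitute $\boldsymbol{a} = 
(\hat{x}_{1,n}^{(i)}, \ldots, \hat{x}_{T_c,n}^{(i)})^{\mathrm{T}}$ and $\boldsymbol{\Sigma} = \mathrm{diag}(N_0 + 1 - |\hat{x}_{1,n}^{(i)}|^2, \ldots, N_0 + 
1 - |\hat{x}_{T_c,n}^{(i)}|^2)$. 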